The Conditional Lucas & Kanade Algorithm
The Lucas & Kanade (LK) algorithm is the method of choice for efficient dense
image and object alignment. The approach is efficient as it attempts to model
the connection between appearance and geometric displacement through a linear
relationship that assumes independence across pixel coordinates. A drawback of
the approach, however, is its generative nature. Specifically, its performance
is tightly coupled with how well the linear model can synthesize appearance
from geometric displacement, even though the alignment task itself is
associated with the inverse problem. In this paper, we present a new approach,
referred to as the Conditional LK algorithm, which: (i) directly learns linear
models that predict geometric displacement as a function of appearance, and
(ii) employs a novel strategy for ensuring that the generative pixel
independence assumption can still be taken advantage of. We demonstrate that
our approach exhibits superior performance to classical generative forms of the
LK algorithm. Furthermore, we demonstrate its comparable performance to
state-of-the-art methods such as the Supervised Descent Method with
substantially fewer training examples, as well as the unique ability to "swap"
geometric warp functions without having to retrain from scratch. Finally, from
a theoretical perspective, our approach hints at possible redundancies that
exist in current state-of-the-art methods for alignment that could be leveraged
in vision systems of the future.

Comment: 17 pages, 11 figures
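The generative-versus-conditional distinction in the abstract above can be illustrated in one dimension. The sketch below is our own minimal construction (translation-only warp, illustrative names, not the authors' implementation): a classical LK step linearizes the template to solve for displacement, while a "conditional"-style regressor is fit by least squares to predict displacement directly from appearance differences over synthetic perturbations.

```python
import numpy as np

def make_signal(n=200):
    x = np.arange(n)
    return np.exp(-0.5 * ((x - n / 2) / 10.0) ** 2)  # smooth 1D template

def shift(signal, dx):
    x = np.arange(len(signal))
    return np.interp(x - dx, x, signal)  # translate by dx (linear interp)

template = make_signal()
grad = np.gradient(template)  # appearance Jacobian of a 1D translation

def lk_step(image):
    # Classical (generative) LK: linearize T(x - dp) ~ T(x) - dp * T'(x)
    # and solve the scalar normal equation for the displacement dp.
    return -grad @ (image - template) / (grad @ grad)

# "Conditional" flavour: learn a linear map from appearance differences
# to displacement, using synthetically perturbed copies of the template.
train_dx = np.linspace(-3, 3, 25)
A = np.stack([shift(template, d) - template for d in train_dx])
w, *_ = np.linalg.lstsq(A, train_dx, rcond=None)

image = shift(template, 1.5)  # unknown displacement to recover
print(lk_step(image), w @ (image - template))
```

Both estimators recover the displacement here; the point of contrast is that the first inverts a generative appearance model, while the second learns the inverse mapping directly.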
CubeNet: Equivariance to 3D Rotation and Translation
3D Convolutional Neural Networks are sensitive to transformations applied to
their input. This is a problem because a voxelized version of a 3D object, and
its rotated clone, will look unrelated to each other after passing through to
the last layer of a network. Instead, an idealized model would preserve a
meaningful representation of the voxelized object, while explaining the
pose-difference between the two inputs. An equivariant representation vector
has two components: the invariant identity part, and a discernible encoding of
the transformation. Models that can't explain pose-differences risk "diluting"
the representation, in pursuit of optimizing a classification or regression
loss function.
We introduce a Group Convolutional Neural Network with linear equivariance to
translations and right angle rotations in three dimensions. We call this
network CubeNet, reflecting its cube-like symmetry. By construction, this
network helps preserve a 3D shape's global and local signature, as it is
transformed through successive layers. We apply this network to a variety of 3D
inference problems, achieving state-of-the-art on the ModelNet10 classification
challenge, and comparable performance on the ISBI 2012 Connectome Segmentation
Benchmark. To the best of our knowledge, this is the first 3D rotation
equivariant CNN for voxel representations.

Comment: Preprint
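The cube-group structure underlying the abstract above can be made concrete without any learning. The sketch below is our own illustration (not CubeNet's architecture): enumerating the 24 right-angle rotations of a voxel grid gives a group orbit; pooling over the orbit yields an invariant "identity" descriptor, while the index of the matching rotation encodes the pose, mirroring the two components of an equivariant representation described above.

```python
import numpy as np

def z_spins(v):
    for k in range(4):
        yield np.rot90(v, k, axes=(0, 1))

def rotations24(v):
    # 6 choices of "up" face x 4 spins about the vertical axis = cube group
    yield from z_spins(v)
    for k in (1, 2, 3):
        yield from z_spins(np.rot90(v, k, axes=(0, 2)))
    for k in (1, 3):
        yield from z_spins(np.rot90(v, k, axes=(1, 2)))

def invariant_id(v):
    # orbit pooling: a descriptor unchanged by any right-angle rotation of v
    return max(r.tobytes() for r in rotations24(v))

def pose(v, reference):
    # which group element maps reference onto v (the pose component)
    return next(k for k, r in enumerate(rotations24(reference))
                if np.array_equal(r, v))

rng = np.random.default_rng(0)
voxels = (rng.random((4, 4, 4)) > 0.5).astype(np.uint8)
rotated = np.rot90(voxels, 1, axes=(1, 2))  # rigidly rotated clone

print(invariant_id(voxels) == invariant_id(rotated))
print(pose(rotated, voxels))
```

A network layer built from such group-indexed copies of a filter is equivariant by construction; the invariance shown here falls out of pooling over the group dimension.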
Culture shapes how we look at faces
Background: Face processing, amongst many basic visual skills, is thought to be invariant across all humans. From as early as 1965, studies of eye movements have consistently revealed a systematic triangular sequence of fixations over the eyes and the mouth, suggesting that faces elicit a universal, biologically-determined information extraction pattern.

Methodology/Principal Findings: Here we monitored the eye movements of Western Caucasian and East Asian observers while they learned, recognized, and categorized by race Western Caucasian and East Asian faces. Western Caucasian observers reproduced a scattered triangular pattern of fixations for faces of both races and across tasks. Contrary to intuition, East Asian observers focused more on the central region of the face.

Conclusions/Significance: These results demonstrate that face processing can no longer be considered as arising from a universal series of perceptual events. The strategy employed to extract visual information from faces differs across cultures.
Saccadic facilitation by modulation of microsaccades in natural backgrounds
Saccades move objects of interest into the center of the visual field for high-acuity visual analysis. White, Stritzke, and Gegenfurtner (Current Biology, 18, 124–128, 2008) have shown that saccadic latencies in the context of a structured background are much shorter than those with an unstructured background at equal levels of visibility. This effect has been explained by possible preactivation of the saccadic circuitry whenever a structured background acts as a mask for potential saccade targets. Here, we show that background textures modulate rates of microsaccades during visual fixation. First, after a display change, structured backgrounds induce a stronger decrease of microsaccade rates than do uniform backgrounds. Second, we demonstrate that the occurrence of a microsaccade in a critical time window can delay a subsequent saccadic response. Taken together, our findings suggest that microsaccades contribute to the saccadic facilitation effect, due to a modulation of microsaccade rates by properties of the background
Optimal measurement of visual motion across spatial and temporal scales
Sensory systems use limited resources to mediate the perception of a great
variety of objects and events. Here a normative framework is presented for
exploring how the problem of efficient allocation of resources can be solved in
visual perception. Starting with a basic property of every measurement,
captured by Gabor's uncertainty relation about the location and frequency
content of signals, prescriptions are developed for optimal allocation of
sensors for reliable perception of visual motion. This study reveals that a
large-scale characteristic of human vision (the spatiotemporal contrast
sensitivity function) is similar to the optimal prescription, and it suggests
that some previously puzzling phenomena of visual sensitivity, adaptation, and
perceptual organization have simple principled explanations.

Comment: 28 pages, 10 figures, 2 appendices; in press in Favorskaya MN and
Jain LC (Eds), Computer Vision in Advanced Control Systems using Conventional
and Intelligent Paradigms, Intelligent Systems Reference Library,
Springer-Verlag, Berlin
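The "basic property of every measurement" referred to in the abstract above is Gabor's uncertainty relation. Stated here as general background (root-mean-square spreads, not necessarily the paper's notation), it bounds how precisely any single sensor can jointly resolve the location and frequency content of a signal:

```latex
\Delta t \,\Delta f \;\ge\; \frac{1}{4\pi}
```

Equality holds only for Gaussian-windowed sinusoids (Gabor functions), which is one reason such functions are widely used to model visual receptive fields.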
Adaptive Filtering Enhances Information Transmission in Visual Cortex
Sensory neuroscience seeks to understand how the brain encodes natural
environments. However, neural coding has largely been studied using simplified
stimuli. In order to assess whether the brain's coding strategy depends on the
stimulus ensemble, we apply a new information-theoretic method that allows
unbiased calculation of neural filters (receptive fields) from responses to
natural scenes or other complex signals with strong multipoint correlations. In
the cat primary visual cortex we compare responses to natural inputs with those
to noise inputs matched for luminance and contrast. We find that neural filters
adaptively change with the input ensemble so as to increase the information
carried by the neural response about the filtered stimulus. Adaptation affects
the spatial frequency composition of the filter, enhancing sensitivity to
under-represented frequencies in agreement with optimal encoding arguments.
Adaptation occurs over 40 s to many minutes, longer than most previously
reported forms of adaptation.

Comment: 20 pages, 11 figures, includes supplementary information
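Why natural inputs with strong correlations demand special estimators, as the abstract above notes, can be illustrated with a linear-nonlinear model neuron. The sketch below uses a standard whitened spike-triggered average (STA), not the paper's information-theoretic method, and all parameters are illustrative: a correlated stimulus biases the raw STA toward the stimulus covariance, and decorrelating by that covariance largely recovers the true filter.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n = 20, 300_000

# Correlated "natural-like" stimulus: each channel mixes two adjacent white
# channels, giving strong neighbour correlations (Toeplitz covariance).
white = rng.standard_normal((n, dim + 1))
stim = white[:, :dim] + 0.8 * white[:, 1:]

# Ground-truth high-frequency filter of a linear-nonlinear model neuron.
true_filter = np.zeros(dim)
true_filter[8:12] = [1.0, -2.0, 2.0, -1.0]
true_filter /= np.linalg.norm(true_filter)

rate = np.exp(stim @ true_filter - 1.0)  # exponential nonlinearity
spikes = rng.poisson(rate)               # Poisson spike counts per bin

sta = spikes @ stim / spikes.sum()       # raw STA: smeared by correlations
cov = np.cov(stim.T)
# Decorrelated STA; the small ridge term is only for numerical stability.
whitened = np.linalg.solve(cov + 0.02 * np.eye(dim), sta)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(sta, true_filter), cosine(whitened, true_filter))
```

For this exponential-nonlinearity model the raw STA converges to the covariance-filtered kernel rather than the kernel itself, which is exactly the kind of bias that motivates estimators designed for stimuli with strong multipoint correlations.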
Longer fixation duration while viewing face images
The spatio-temporal properties of saccadic eye movements can be influenced by the cognitive demand and the characteristics of the observed scene. Probably due to its crucial role in social communication, it is argued that face perception may involve different cognitive processes compared with non-face object or scene perception. In this study, we investigated whether and how face and natural scene images can influence the patterns of visuomotor activity. We recorded monkeys’ saccadic eye movements as they freely viewed monkey face and natural scene images. The face and natural scene images attracted a similar number of fixations, but viewing of faces was accompanied by longer fixations compared with natural scenes. These longer fixations were dependent on the context of facial features. The duration of fixations directed at facial contours decreased when the face images were scrambled, and increased at the later stage of normal face viewing. The results suggest that face and natural scene images can generate different patterns of visuomotor activity. The extra fixation duration on faces may be correlated with the detailed analysis of facial features.
Local biases drive, but do not determine, the perception of illusory trajectories
When a dot moves horizontally across a set of tilted lines of alternating orientations, the dot appears to be moving up and down along its trajectory. This perceptual phenomenon, known as the slalom illusion, reveals a mismatch between the veridical motion signals and the subjective percept of the motion trajectory, which has not been comprehensively explained. In the present study, we investigated the empirical boundaries of the slalom illusion using psychophysical methods. The phenomenon was found to occur both under conditions of smooth pursuit eye movements and constant fixation, and to be consistently amplified by intermittently occluding the dot trajectory. When the motion direction of the dot was not constant, however, the stimulus display did not elicit the expected illusory percept. These findings confirm that a local bias towards perpendicularity at the intersection points between the dot trajectory and the tilted lines causes the illusion, but also highlight that higher-level cortical processes are involved in interpreting and amplifying the biased local motion signals into a global illusion of trajectory perception.